<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
        <title>DSpace Documentation : DSpace Statistics</title>
	    <link rel="stylesheet" href="styles/site.css" type="text/css" />
        <META http-equiv="Content-Type" content="text/html; charset=UTF-8">	    
    </head>

    <body>
	    <table class="pagecontent" border="0" cellpadding="0" cellspacing="0" width="100%" bgcolor="#ffffff">
		    <tr>
			    <td valign="top" class="pagebody">
				    <div class="pageheader">
					    <span class="pagetitle">
                            DSpace Documentation : DSpace Statistics
                                                    </span>
				    </div>
				    <div class="pagesubheading">
					    This page last changed on Jan 14, 2011 by <font color="#0050B2">benbosman</font>.
				    </div>

				    <h1><a name="DSpaceStatistics-DSpaceStatistics"></a>DSpace Statistics</h1>

<p>DSpace 1.6 and newer versions use Apache Solr as the engine underlying the statistics. Solr enables efficient searching of, and adding to, large volumes of usage data.<br/>
Unlike previous versions, enabling statistics in DSpace does not require additional installation or customization. All the necessary software is included.</p>

<style type='text/css'>/*<![CDATA[*/
div.rbtoc1295025180569 {margin-left: 0px;padding: 0px;}
div.rbtoc1295025180569 ul {list-style: none;margin-left: 0px;}
div.rbtoc1295025180569 li {margin-left: 0px;padding-left: 0px;}

/*]]>*/</style><div class='rbtoc1295025180569'>
<ul>
    <li><span class='TOCOutline'>1</span> <a href='#DSpaceStatistics-Whatisexactlybeinglogged%3F'>What exactly is being logged?</a></li>
    <li><span class='TOCOutline'>2</span> <a href='#DSpaceStatistics-WebuserinterfaceforDSpacestatistics'>Web user interface for DSpace statistics</a></li>
<ul>
    <li><span class='TOCOutline'>2.1</span> <a href='#DSpaceStatistics-Homepage'>Home page</a></li>
    <li><span class='TOCOutline'>2.2</span> <a href='#DSpaceStatistics-Communityhomepage'>Community home page</a></li>
    <li><span class='TOCOutline'>2.3</span> <a href='#DSpaceStatistics-Collectionhomepage'>Collection home page</a></li>
    <li><span class='TOCOutline'>2.4</span> <a href='#DSpaceStatistics-Itemhomepage'>Item home page</a></li>
</ul>
    <li><span class='TOCOutline'>3</span> <a href='#DSpaceStatistics-UsageEventLoggingandUsageStatisticsGathering'>Usage Event Logging and Usage Statistics Gathering</a></li>
    <li><span class='TOCOutline'>4</span> <a href='#DSpaceStatistics-ConfigurationsettingsforStatistics'>Configuration settings for Statistics</a></li>
<ul>
    <li><span class='TOCOutline'>4.1</span> <a href='#DSpaceStatistics-UpgradeProcessforStatistics.'>Upgrade Process for Statistics.</a></li>
</ul>
    <li><span class='TOCOutline'>5</span> <a href='#DSpaceStatistics-Oldersettingthatarenotrelatedtothenew1.6Statistics'>Older settings that are not related to the new 1.6 Statistics</a></li>
    <li><span class='TOCOutline'>6</span> <a href='#DSpaceStatistics-StatisticsAdministration'>Statistics Administration</a></li>
<ul>
    <li><span class='TOCOutline'>6.1</span> <a href='#DSpaceStatistics-ConvertingolderDSpacelogsintoSOLRusagedatahttps%3A%2F%2Fwiki.duraspace.org%2Fdisplay%2FDSDOC%2FSystemAdministration%23SystemAdministrationDSpaceLogConverter'> Converting older DSpace logs into SOLR usage data</a></li>
    <li><span class='TOCOutline'>6.2</span> <a href='#DSpaceStatistics-StatisticsClientUtilityhttps%3A%2F%2Fwiki.duraspace.org%2Fdisplay%2FDSDOC%2FSystemAdministration%23SystemAdministrationClientStatistics'> Statistics Client Utility</a></li>
</ul>
    <li><span class='TOCOutline'>7</span> <a href='#DSpaceStatistics-StatisticsdifferencesbetweenDSpace1.6.xand1.7.0'>Statistics differences between DSpace 1.6.x and 1.7.0</a></li>
<ul>
    <li><span class='TOCOutline'>7.1</span> <a href='#DSpaceStatistics-SOLRoptimizationadded'>SOLR optimization added</a></li>
    <li><span class='TOCOutline'>7.2</span> <a href='#DSpaceStatistics-SOLRAutocommit'>SOLR Autocommit</a></li>
</ul>
</ul></div>

<h2><a name="DSpaceStatistics-Whatisexactlybeinglogged%3F"></a>What exactly is being logged?</h2>

<p>Each time a page or file is requested, the request is logged. Logging happens on the server side, so, unlike Google Analytics, it does not require JavaScript in the browser to provide usage data.</p>

<p>The fields to be stored are defined in the file dspace/solr/statistics/conf/schema.xml.<br/>
Some example fields that can be stored per usage event include:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">&lt;field name=<span class="code-quote">"type"</span> type=<span class="code-quote">"integer"</span> indexed=<span class="code-quote">"<span class="code-keyword">true</span>"</span> stored=<span class="code-quote">"<span class="code-keyword">true</span>"</span> required=<span class="code-quote">"<span class="code-keyword">true</span>"</span> /&gt;
&lt;field name=<span class="code-quote">"id"</span> type=<span class="code-quote">"integer"</span> indexed=<span class="code-quote">"<span class="code-keyword">true</span>"</span> stored=<span class="code-quote">"<span class="code-keyword">true</span>"</span> required=<span class="code-quote">"<span class="code-keyword">true</span>"</span> /&gt;
&lt;field name=<span class="code-quote">"ip"</span> type=<span class="code-quote">"string"</span> indexed=<span class="code-quote">"<span class="code-keyword">true</span>"</span> stored=<span class="code-quote">"<span class="code-keyword">true</span>"</span> required=<span class="code-quote">"<span class="code-keyword">false</span>"</span> /&gt;
&lt;field name=<span class="code-quote">"time"</span> type=<span class="code-quote">"date"</span> indexed=<span class="code-quote">"<span class="code-keyword">true</span>"</span> stored=<span class="code-quote">"<span class="code-keyword">true</span>"</span> required=<span class="code-quote">"<span class="code-keyword">true</span>"</span> /&gt;
&lt;field name=<span class="code-quote">"epersonid"</span> type=<span class="code-quote">"integer"</span> indexed=<span class="code-quote">"<span class="code-keyword">true</span>"</span> stored=<span class="code-quote">"<span class="code-keyword">true</span>"</span> required=<span class="code-quote">"<span class="code-keyword">false</span>"</span> /&gt;
&lt;field name=<span class="code-quote">"country"</span> type=<span class="code-quote">"string"</span> indexed=<span class="code-quote">"<span class="code-keyword">true</span>"</span> stored=<span class="code-quote">"<span class="code-keyword">true</span>"</span> required=<span class="code-quote">"<span class="code-keyword">false</span>"</span> /&gt;
&lt;field name=<span class="code-quote">"city"</span> type=<span class="code-quote">"string"</span> indexed=<span class="code-quote">"<span class="code-keyword">true</span>"</span> stored=<span class="code-quote">"<span class="code-keyword">true</span>"</span> required=<span class="code-quote">"<span class="code-keyword">false</span>"</span>/&gt;
&lt;field name=<span class="code-quote">"owningComm"</span> type=<span class="code-quote">"integer"</span> indexed=<span class="code-quote">"<span class="code-keyword">true</span>"</span> stored=<span class="code-quote">"<span class="code-keyword">true</span>"</span> required=<span class="code-quote">"<span class="code-keyword">false</span>"</span> multiValued=<span class="code-quote">"<span class="code-keyword">true</span>"</span> /&gt;
</pre>
</div></div>

<p>The combination of <a href="https://wiki.duraspace.org/display/DSDOC/Business+Logic+Layer#BusinessLogicLayer-Constants">type</a> and id determines which resource (a community, collection, item page or file download) has been requested.</p>
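<p>As a rough illustration of how the two fields combine, the sketch below maps a <tt>type</tt> value to a resource kind. The numeric values are an assumption (they are believed to mirror the constants on the linked Business Logic Layer page), and the function name <tt>describe_event</tt> and the sample ids are hypothetical. Check the numbers against your DSpace version before relying on them.</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
```python
# Assumed resource type constants (believed to mirror org.dspace.core.Constants).
RESOURCE_TYPES = {
    0: "bitstream (file download)",
    2: "item page",
    3: "collection home page",
    4: "community home page",
}


def describe_event(event_type, resource_id):
    """Return a readable label for the resource identified by (type, id)."""
    kind = RESOURCE_TYPES.get(event_type, "unknown resource type")
    return f"{kind} #{resource_id}"


if __name__ == "__main__":
    # Hypothetical ids, used only to show the lookup.
    print(describe_event(2, 123))
    print(describe_event(0, 4567))
```
</pre>
</div></div>

<p>Neither field is meaningful on its own: the same id can occur for an item and for a collection, so queries normally constrain both.</p>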

<h2><a name="DSpaceStatistics-WebuserinterfaceforDSpacestatistics"></a>Web user interface for DSpace statistics</h2>

<p>In the XMLUI, statistics can be accessed from the lower end of the navigation menu. In the JSPUI, a view statistics button appears at the bottom of pages for which statistics are available.</p>

<p>If you are not seeing these links or buttons, it's likely that they are only enabled for administrators in your installation. Change the configuration parameter "statistics.item.authorization.admin" to false in order to make statistics visible for all repository visitors.</p>

<h3><a name="DSpaceStatistics-Homepage"></a>Home page</h3>

<p>Starting from the repository homepage, the statistics page displays the top 10 most popular items of the entire repository.</p>

<h3><a name="DSpaceStatistics-Communityhomepage"></a>Community home page</h3>

<p>The following statistics are available for the community home pages:</p>
<ul>
	<li>Total visits to the current community home page</li>
	<li>Visits to the community home page over the last 7 months</li>
	<li>Top 10 countries from which the visits originate</li>
	<li>Top 10 cities from which the visits originate</li>
</ul>


<h3><a name="DSpaceStatistics-Collectionhomepage"></a>Collection home page</h3>

<p>The following statistics are available for the collection home pages:</p>
<ul>
	<li>Total visits to the current collection home page</li>
	<li>Visits to the collection home page over the last 7 months</li>
	<li>Top 10 countries from which the visits originate</li>
	<li>Top 10 cities from which the visits originate</li>
</ul>


<h3><a name="DSpaceStatistics-Itemhomepage"></a>Item home page</h3>

<p>The following statistics are available for the item home pages:</p>
<ul>
	<li>Total visits to the item</li>
	<li>Total visits to the bitstreams attached to the item</li>
	<li>Visits to the item over the last 7 months</li>
	<li>Top 10 countries from which the visits originate</li>
	<li>Top 10 cities from which the visits originate</li>
</ul>


<h2><a name="DSpaceStatistics-UsageEventLoggingandUsageStatisticsGathering"></a>Usage Event Logging and Usage Statistics Gathering</h2>

<p>The DSpace statistics implementation is a client/server architecture based on Solr for collecting usage events in the JSPUI and XMLUI user interface applications of DSpace. Solr runs as a separate web application, and an instance of Apache HttpClient is used to allow parallel requests that log statistics events into this Solr instance.</p>
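<p>To make the client/server split more concrete, the following sketch builds the kind of XML update message (&lt;add&gt;, &lt;doc&gt;, &lt;field&gt;) that Solr's standard update handler accepts for one usage event. This is not DSpace's own client code. The function name <tt>build_update_xml</tt> and the sample values are hypothetical, and the field names are borrowed from the schema.xml excerpt earlier on this page.</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
```python
import xml.etree.ElementTree as ET


def build_update_xml(event):
    """Serialize one usage event as a Solr add/doc/field update message.

    List values produce one field element per entry, which is how
    multiValued fields such as owningComm are represented.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in event.items():
        values = value if isinstance(value, list) else [value]
        for single in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(single)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical event: an item page view inside two communities.
    sample = {"type": 2, "id": 123, "owningComm": [4, 7]}
    print(build_update_xml(sample))
```
</pre>
</div></div>

<p>A client would send such a message to the update URL of the statistics core over HTTP. Because each request is independent, several of them can be in flight at the same time.</p>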

<h2><a name="DSpaceStatistics-ConfigurationsettingsforStatistics"></a>Configuration settings for Statistics</h2>

<p>In the dspace.cfg file, review the following properties and make sure they are uncommented:</p>

<div class='table-wrap'>
<table class='confluenceTable'><tbody>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> solr.log.server </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> solr.log.server = <a href="http://127.0.0.1/solr/statistics">http://127.0.0.1/solr/statistics</a> <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> Used by the SolrLogger client class to connect to the Solr server over HTTP and to perform updates and queries. In most cases, this can (and should) be set to localhost (or 127.0.0.1). <br class="atl-forced-newline" />
<br class="atl-forced-newline" />
To determine the correct path, you can use a tool like <tt>wget</tt> to see where Solr is responding on your server.  For example, you'd want to send a query to Solr like the following: <br class="atl-forced-newline" />
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">wget <span class="code-quote">"http://127.0.0.1/solr/statistics/select?q=*:*"</span></pre>
</div></div>
<p>Assuming you get an HTTP 200 OK response, set <tt>solr.log.server</tt> to the '/statistics' URL, 'http://127.0.0.1/solr/statistics' (that is, the responding URL with the "/select?q=*:*" query removed from the end). </p></td>
</tr>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> solr.spiderips.urls <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> solr.spiderips.urls  = 
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">http:<span class="code-comment">//iplists.com/google.txt, \
</span>http:<span class="code-comment">//iplists.com/inktomi.txt, \
</span>http:<span class="code-comment">//iplists.com/lycos.txt, \
</span>http:<span class="code-comment">//iplists.com/infoseek.txt, \
</span>http:<span class="code-comment">//iplists.com/altavista.txt, \
</span>http:<span class="code-comment">//iplists.com/excite.txt, \
</span>http:<span class="code-comment">//iplists.com/misc.txt, \
</span>http:<span class="code-comment">//iplists.com/non_engines.txt</span>
</pre>
</div></div>
<p> <br class="atl-forced-newline" /> </p></td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> List of URLs from which spider files are downloaded into [dspace]/config/spiders. These files contain lists of known spider IPs and are used by the SolrLogger to flag usage events with an "isBot" field, or to ignore them entirely. <br class="atl-forced-newline" />
<br class="atl-forced-newline" />
The "stats-util" command can be used to force an update of spider files, regenerate "isBot" fields on indexed events, and delete spiders from the index. For usage, run: <br class="atl-forced-newline" />
<br class="atl-forced-newline" />
<br class="atl-forced-newline" />  
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">dspace stats-util -h
</pre>
</div></div>
<p>from your&nbsp;[dspace]/bin directory <br class="atl-forced-newline" /> </p></td>
</tr>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> solr.dbfile <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> solr.dbfile = ${dspace.dir}/config/GeoLiteCity.dat <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> Refers to the GeoLiteCity database file used by LocationUtils to calculate the location of client requests based on IP address. During the Ant build process (both fresh_install and update), this file is downloaded from <a href="http://www.maxmind.com/app/geolitecity">http://www.maxmind.com/app/geolitecity</a> if a new version has been published or if it is absent from your [dspace]/config directory. <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> solr.resolver.timeout <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> solr.resolver.timeout = 200 <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> Timeout in milliseconds for DNS resolution of origin hosts/IPs. Setting this value too high may result in Solr exhausting your connection pool. <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> useProxies <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> useProxies = true <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> Causes statistics logging to look at the X-Forwarded-For header to detect the IP address of clients that access DSpace through a proxy service (e.g. Apache mod_proxy). [Note: This setting is found in the DSpace Logging section of dspace.cfg] <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> statistics.item.authorization.admin <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> statistics.item.authorization.admin = true <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> When set to true, only general administrators and collection and community administrators are able to access the statistics from the web user interface. As a result, the links to access statistics are hidden from users who are not logged in as administrators. Setting this property to "false" displays the links to everyone, making the statistics publicly available. <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> solr.statistics.logBots <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> solr.statistics.logBots = true <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> When this property is set to false and an IP is detected as a spider, the event is not logged. <br class="atl-forced-newline" />
When this property is set to true, the event will be logged with the "isBot" field set to true. <br class="atl-forced-newline" />
(see solr.statistics.query.filter.&#42; for query filter options) <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> solr.statistics.query.filter.spiderIp <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> solr.statistics.query.filter.spiderIp = false <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> If true, statistics queries will filter out spider IPs. Use with caution, as this often results in extremely long query strings. <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Property: </td>
<td class='confluenceTd'> solr.statistics.query.filter.isBot <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Example Value: </td>
<td class='confluenceTd'> solr.statistics.query.filter.isBot = true <br class="atl-forced-newline" /> </td>
</tr>
<tr>
<td class='confluenceTd'> Informational Note: </td>
<td class='confluenceTd'> If true, statistics queries will filter out events flagged with the "isBot" field. This is the recommended method of filtering spiders from statistics. <br class="atl-forced-newline" /> </td>
</tr>
</tbody></table>
</div>
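<p>The interplay between <tt>solr.statistics.logBots</tt> and <tt>solr.statistics.query.filter.isBot</tt> in the table above can be sketched as follows. This is a simplified model rather than the SolrLogger implementation: the function names are hypothetical, and setting "isBot" to false for non-spider traffic is an assumption made for the example.</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
```python
def prepare_usage_event(event, is_spider_ip, log_bots):
    """Apply the solr.statistics.logBots rule to a single usage event.

    Returns None when the event should be dropped, otherwise a copy of
    the event carrying an "isBot" flag.
    """
    if is_spider_ip and not log_bots:
        return None  # logBots = false: spider traffic is not logged at all
    flagged = dict(event)
    flagged["isBot"] = is_spider_ip  # assumption: False for ordinary visitors
    return flagged


def visible_events(events, filter_is_bot):
    """Model solr.statistics.query.filter.isBot when reading events back."""
    if not filter_is_bot:
        return list(events)
    return [e for e in events if not e.get("isBot", False)]


if __name__ == "__main__":
    stored = [
        prepare_usage_event({"type": 2, "id": 1}, is_spider_ip=True, log_bots=True),
        prepare_usage_event({"type": 2, "id": 1}, is_spider_ip=False, log_bots=True),
    ]
    print(visible_events(stored, filter_is_bot=True))
```
</pre>
</div></div>

<p>Logging spider events with a flag and filtering them at query time keeps the raw data available, whereas setting <tt>solr.statistics.logBots</tt> to false discards it permanently.</p>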


<h3><a name="DSpaceStatistics-UpgradeProcessforStatistics."></a>Upgrade Process for Statistics.</h3>

<p>Example of rebuilding and redeploying DSpace (only if you have configured your distribution in this manner):</p>

<p>First, follow the traditional DSpace build process for updating:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java"> cd [dspace-source]/dspace
 mvn <span class="code-keyword">package</span>
 cd [dspace-source]/dspace/target/dspace-&lt;version&gt;-build.dir
 ant -Dconfig=[dspace]/config/dspace.cfg update
 cp -R [dspace]/webapps/* [TOMCAT]/webapps
</pre>
</div></div>

<p>The last step is only needed if you are not mounting <em>[dspace]/webapps</em> directly into your Tomcat, Resin or Jetty host (the recommended practice). If you only need to build the statistics and have not made changes to other web applications, you can replace the copy step above with:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java"> cp -R [dspace]/webapps/solr [TOMCAT]/webapps
</pre>
</div></div>
<p><em>Again, only if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the recommended practice)</em></p>

<p>Restart your servlet container (Tomcat/Jetty/Resin).</p>

<h2><a name="DSpaceStatistics-Oldersettingthatarenotrelatedtothenew1.6Statistics"></a>Older settings that are not related to the new 1.6 Statistics</h2>

<p>The following dspace.cfg fields are only applicable to the older statistics solution.</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java"> ###### Statistical Report Configuration Settings ######

 # should the stats be publicly available?  should be set to <span class="code-keyword">false</span> <span class="code-keyword">if</span> you only
 # want administrators to access the stats, or you <span class="code-keyword">do</span> not intend to generate
 # any
 report.<span class="code-keyword">public</span> = <span class="code-keyword">false</span>

 # directory where live reports are stored
 report.dir = ${dspace.dir}/reports/
</pre>
</div></div>

<p>These fields are not used by the new 1.6 statistics; they relate only to the statistics from previous DSpace releases.</p>

<h2><a name="DSpaceStatistics-StatisticsAdministration"></a>Statistics Administration</h2>

<h3><a name="DSpaceStatistics-ConvertingolderDSpacelogsintoSOLRusagedatahttps%3A%2F%2Fwiki.duraspace.org%2Fdisplay%2FDSDOC%2FSystemAdministration%23SystemAdministrationDSpaceLogConverter"></a><a href="https://wiki.duraspace.org/display/DSDOC/System+Administration#SystemAdministration-DSpaceLogConverter">Converting older DSpace logs into SOLR usage data</a></h3>

<p>If you have upgraded from a previous version of DSpace, converting older log files ensures that you carry over older usage stats from before the upgrade.</p>

<h3><a name="DSpaceStatistics-StatisticsClientUtilityhttps%3A%2F%2Fwiki.duraspace.org%2Fdisplay%2FDSDOC%2FSystemAdministration%23SystemAdministrationClientStatistics"></a><a href="https://wiki.duraspace.org/display/DSDOC/System+Administration#SystemAdministration-ClientStatistics">Statistics Client Utility</a></h3>

<p>The command line interface (CLI) scripts can be used to remove additional spider traffic from the usage database and to perform other maintenance tasks.</p>

<h2><a name="DSpaceStatistics-StatisticsdifferencesbetweenDSpace1.6.xand1.7.0"></a>Statistics differences between DSpace 1.6.x and 1.7.0</h2>

<h3><a name="DSpaceStatistics-SOLRoptimizationadded"></a>SOLR optimization added</h3>

<p>If required, the Solr server can be optimized by running:</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">{dspace.dir}/bin/stats-util -o
</pre>
</div></div>
<p>More information on how these Solr server optimizations work can be found here: <a href="http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations">http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations</a>.</p>

<h3><a name="DSpaceStatistics-SOLRAutocommit"></a>SOLR Autocommit</h3>

<p>In DSpace 1.6.x, each Solr event was committed to the Solr server individually. On high-load DSpace installations, this produced a large number of small commits and a very high load on the Solr server.<br/>
DSpace 1.7 resolves this by committing usage events to the Solr server only once every 15 minutes, so a usage event may take up to 15 minutes to be stored. If required, this interval can be altered by changing the maxTime property in the following file:</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">{dspace.dir}/solr/statistics/conf/solrconfig.xml
</pre>
</div></div>
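<p>To check which interval is currently in effect, a small script along the following lines could read the value back. It assumes the standard Solr layout, in which maxTime sits inside an autoCommit element under updateHandler and is expressed in milliseconds (15 minutes is 900000). The function name <tt>autocommit_minutes</tt> is hypothetical.</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
```python
import sys
import xml.etree.ElementTree as ET


def autocommit_minutes(solrconfig_path):
    """Return the autoCommit maxTime from a solrconfig.xml file, in minutes.

    Returns None when the file has no autoCommit maxTime element.
    """
    root = ET.parse(solrconfig_path).getroot()
    node = root.find("./updateHandler/autoCommit/maxTime")
    if node is None or node.text is None:
        return None
    return int(node.text.strip()) / 60000.0  # maxTime is in milliseconds


if __name__ == "__main__":
    # Pass the path to solrconfig.xml as the first argument.
    if len(sys.argv) == 2:
        print(autocommit_minutes(sys.argv[1]))
```
</pre>
</div></div>

<p>A shorter interval makes usage events visible sooner at the cost of more frequent commits, which is the load problem the 15 minute default was introduced to avoid.</p>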

				    
                    			    </td>
		    </tr>
	    </table>
	    <table border="0" cellpadding="0" cellspacing="0" width="100%">
			<tr>
				<td height="12" background="https://wiki.duraspace.org/images/border/border_bottom.gif"><img src="images/border/spacer.gif" width="1" height="1" border="0"/></td>
			</tr>
		    <tr>
			    <td align="center"><font color="grey">Document generated by Confluence on Mar 25, 2011 19:21</font></td>
		    </tr>
	    </table>
    </body>
</html>